ABSTRACT

The purpose of this research was to reinvestigate the
accuracy of three item bias detection procedures: (1) Linn and
Harnisch's pseudo-IRT(Z) method; (2) Camilli's chi-square technique;
and (3) Angoff's revised transformed item difficulty method. These
methods are applied when the minority group sample size is too small
to obtain stable estimates of item parameters. This study analyzed
data that included ten black slang items imbedded within a
standardized vocabulary test. In order to determine the best
methodology, three statistics were calculated: a point-biserial
correlation between an a priori bias index and the detected bias
index associated with each method, intercorrelations among the bias
measures of the three procedures, and the percentage of agreement between
the a priori bias index and the bias index based on each method. Results
showed that (1) the chi-square technique is slightly more accurate
than the pseudo-IRT(Z) method in detecting bias; (2) Angoff's revised
transformed item difficulty (TID) method is considerably worse; and
(3) the chi-square procedure is highly correlated with the
pseudo-IRT(Z) method. Appendices include item bias indices for all
items and all methods, item information for computing the item bias
index of Angoff's revised TID method for white and black groups,
estimates of item parameters based on the three-parameter logistic
model for Linn-Harnisch's method, and principal component analysis of
the test items. (Author/JAZ)
ABSTRACT

The purpose of this research was to reinvestigate the
accuracy of three item bias detection procedures (Linn and
Harnisch's pseudo-IRT(Z) method, Camilli's chi-square
technique, and Angoff's revised transformed item difficulty
method) in the typical situation where the minority group
has a small number of examinees. The current study analyzed
data that included ten black slang items imbedded
within a standardized vocabulary test. In order to
determine the best methodology among the three procedures, this
study calculated three statistics: a point-biserial
correlation between an a priori bias index and the detected
bias index associated with each method, intercorrelations
among the bias measures of the three procedures, and the
percentage of agreement between the a priori bias index and
the bias index based on each method.

This study found that (1) the chi-square technique is
slightly more accurate than the pseudo-IRT(Z) method in
detecting bias; (2) Angoff's revised TID method is
considerably worse; and (3) the chi-square procedure is
highly correlated with the pseudo-IRT(Z) method.

There are two reasons why the pseudo-IRT(Z) method may be less
accurate than the chi-square technique. One reason is that
the estimates of item parameters for the pseudo-IRT(Z)
procedure may be influenced by combining the minority group
with the majority group. Another reason may be violation of
the test unidimensionality assumed by the pseudo-IRT(Z) method.
BACKGROUND AND PURPOSE

In the last two decades, gender and race differences in 
test outcomes have been special topics of interest in the 
field of education. Jensen's assertion (1969) that there 
was a difference of one standard deviation in intelligence 
between blacks and whites continues to cause concern even 
today. Walker's study (1984) used meta-analysis to check
the widely shared assumption that women fixate at stage 3 in 
moral reasoning, and men progress to stage 4. Prior to 
discussing the argument that one group is better than 
another, it is important to investigate the question of 
whether test items are biased against certain subgroups. 
Williams (1971) insisted that traditional educational and 
employment tests are oriented toward the white middle class. 
Faggen-Steckler, McCarthy, and Tittle (1974) found that
considerable content bias exists even in standardized tests
in terms of the number of noun and pronoun references with
respect to gender. As an indication that bias is an
important topic, the Spring 1976 issue of the Journal of
Educational Measurement was devoted entirely to bias in
selection.

Many psychometricians have attempted to provide a
concrete and clear definition of item bias since the late
1960s. Cleary and Hilton (1968) defined bias as an
interaction between item and group in terms of analysis of
variance. Angoff and Ford (1973) said that an item is
considered biased if the item difficulty index or P-value
for one group is relatively higher or lower than that for
another group. Scheuneman (1975, 1979) stated that an item
is biased if, for all individuals having the same score on a
homogeneous subtest containing the item, the proportion of
individuals getting the item correct is different for
the various population subgroups being considered. A widely
accepted definition is: An item is biased if individuals
with equal ability, but from different groups, have unequal
probability of answering the item correctly.

Shepard (1982) categorized approaches for detecting 
item bias, e.g., judgmental review, statistical review, and 
posterior analysis. Schmeiser (1982) also classified three
approaches to detect item bias; these are the judgmental 
method, statistical item bias method, and experimental 
design method. Various statistical methodologies have been 
proposed for detecting item bias: 

(1) analysis of variance (Cleary & Hilton, 1968);

(2) distractor response analysis (Veale & Foreman, 1975);

(3) transformed item difficulty methods (Angoff, 1982;
Angoff & Ford, 1973; Rudner, Getson, & Knight, 1980;
Shepard, Camilli, & Williams, 1985);

(4) chi-square methods (Camilli, 1979; Scheuneman, 1975,
1979);

(5) item response theory methods (Draba, 1978; Durovic,
1975; Levine, 1981; Levine, Wardrop, & Linn, 1982; Linn,
Levine, Hastings, & Wardrop, 1981; Linn & Harnisch, 1981;
Rudner, 1977; Wright, Mead, & Draba, 1976);

(6) logit model methods (Mellenbergh, 1982; van der Flier,
Mellenbergh, Adèr, & Wijn, 1984).

These methodologies are different but are concerned with
the same concept of bias. They produce somewhat different
results for theoretical and practical reasons.
Therefore, many studies have been devoted to comparisons of
these methods (Ironson, 1977; Ironson & Subkoviak, 1979;
Merz & Grossen, 1979; Rudner & Convey, 1978; Rudner, Getson,
& Knight, 1980; Shepard, Camilli, & Averill, 1981; Shepard,
Camilli, & Williams, 1985; Subkoviak, Mack, Ironson, & Craig,
1984). The most widely accepted methods appear to be the
transformed item difficulty approach (Angoff & Ford, 1973),
the item characteristic curve procedures (Draba, 1978;
Durovic, 1975; Lord, 1977; Rudner, 1977), and the chi-square
methods (Camilli, 1979; Scheuneman, 1975, 1979), which are
similar in certain respects to the item characteristic curve
method. Although comparative studies agree that the best
procedure is the three-parameter item characteristic curve
method, followed by the chi-square method, and then the
transformed item difficulty method, most of these studies
have not included recently revised or new methods.

Recently, a number of new or modified methods have been
proposed for detecting item bias when the minority sample is
small. One of the new methodologies is Linn and Harnisch's
so-called pseudo-IRT(Z) technique. Another new
methodology is Mellenbergh's (1982), which uses a log-linear model
for a three-way contingency table of test score categories,
groups, and item responses. Another modified technique is
Angoff's revised transformed item difficulty procedure
(1982).

Shepard, Camilli, and Williams (1985) investigated Linn
and Harnisch's pseudo-IRT(Z) and Angoff's revised
transformed item difficulty method and compared them to
other commonly used methods. Their study concluded that (1)
the pseudo-IRT(Z) is the method of choice when the
minority group has 300 or fewer members; (2) the
pseudo-IRT(Z) method is highly correlated with the widely
accepted three-parameter item characteristic curve method;
(3) the pseudo-IRT(Z) method is more accurate than a
chi-square method at identifying bias; and (4) Angoff's
revised transformed item difficulty procedure is
considerably worse.

However, additional studies are needed to confirm the
above claims. The present study is interested in evaluating
the accuracy of Linn and Harnisch's method. A primary
question which the present study will attempt to answer is:
Did the pseudo-IRT(Z) method perform well in Shepard et
al.'s study (1985) because biased items were defined as such
via large-sample IRT analysis? It will also be possible to
examine Camilli's chi-square method and Angoff's revised
transformed item difficulty method.

METHOD 



Three statistical item bias indices were computed in
this research: Linn-Harnisch's pseudo-IRT(Z) index based on
the three-parameter logistic model, a chi-square statistic
resulting from Camilli's method, and the distance from a point
to the major axis resulting from Angoff's revised TID method.
Both signed and unsigned measures were computed.

Pseudo-IRT(Z). Linn and Harnisch (1981) proposed an
alternative to the three-parameter item response theory
method when the minority group sample size is too small to
obtain stable estimates of the item parameters.

This procedure estimates item discrimination, item
difficulty, and guessing parameters based on the combined
sample of minority and majority group examinees. $P_i(\theta_j)$,
the probability that examinee j will answer item i
correctly, is obtained by the following formula:

$$P_i(\theta_j) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-\alpha_i(\theta_j - \beta_i)}} \qquad (1)$$

where $\theta_j$ = examinee ability level;
      $\alpha_i$ = item discrimination parameter;
      $\beta_i$ = item difficulty parameter; and
      $c_i$ = item guessing parameter.
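
As a small illustration of Equation (1), the following sketch evaluates the three-parameter logistic probability for one examinee and one item. The function name `p_correct` and the parameter values are hypothetical, chosen only for illustration; any scaling constant applied to the exponent in a particular estimation program is assumed here to be absorbed into the discrimination parameter.

```python
import math


def p_correct(theta, a, b, c):
    """Three-parameter logistic probability of a correct response.

    theta: examinee ability; a: item discrimination;
    b: item difficulty; c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: when ability equals difficulty, the probability
# lies halfway between the guessing level and 1.
print(p_correct(theta=0.0, a=1.0, b=0.0, c=0.25))  # 0.625
```

For very low ability the probability approaches the guessing parameter, which is the feature that distinguishes this model from the one- and two-parameter logistic models.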
Minority group examinees are divided into quintiles on the
basis of their estimated ability levels. A standardized
difference score for examinees in quintile q is then
computed as follows:

$$Z_{iq} = \frac{1}{N_q} \sum_{j \in q} \frac{U_i(\theta_j) - P_i(\theta_j)}{\sqrt{P_i(\theta_j)\,Q_i(\theta_j)}} \qquad (2)$$

where $U_i(\theta_j)$ = 1 if person j answers item i
                        correctly, or 0 otherwise;
      $P_i(\theta_j)$ = the estimated probability that
                        person j answers item i correctly,
                        based on the combined group;
      $Q_i(\theta_j)$ = $1 - P_i(\theta_j)$; and
      $N_q$ = the number of examinees in
              quintile q.

$Z_{iq}$ is an index of the degree to which the observed
performance for members of quintile q is better or worse
than predicted by the model in Equation (1). The following
formula likewise is used to obtain a standardized difference
for the complete minority group as an index of bias:

$$Z_{i\cdot} = \frac{\sum_{q=1}^{5} N_q Z_{iq}}{\sum_{q=1}^{5} N_q} \qquad (3)$$

The $Z_{i\cdot}$ index will be 0 when an item is not biased, while a
large $Z_{i\cdot}$ value (positive or negative) suggests the presence
of bias. A positive sign indicates that an item favors the
minority group, because their actual performance is better
than their expected performance. Either a signed or an
unsigned index can be calculated. If the direction of
bias is not consistent across the quintiles, the signed
index is small. An unsigned $Z_{i\cdot}$ index is simply the sum of
the absolute values of $N_q Z_{iq}$ in Equation (3).
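
The quintile and whole-group indices can be sketched as follows. This is a minimal illustration, not the authors' program: the function names are hypothetical, the whole-group index is assumed to be the $N_q$-weighted mean of the quintile values, and the unsigned index is assumed to use the same denominator with absolute values of $N_q Z_{iq}$ in the numerator.

```python
import math


def z_quintile(responses, probs):
    """Mean standardized residual (U - P) / sqrt(P * Q) within one quintile."""
    total = 0.0
    for u, p in zip(responses, probs):
        total += (u - p) / math.sqrt(p * (1.0 - p))
    return total / len(responses)


def z_item(quintiles):
    """Signed and unsigned indices for one item.

    quintiles: list of (responses, probs) pairs, one pair per quintile,
    where responses are 0/1 scores of minority examinees and probs are
    the model probabilities estimated from the combined group.
    """
    n_total = sum(len(u) for u, _ in quintiles)
    weighted = [len(u) * z_quintile(u, p) for u, p in quintiles]
    signed = sum(weighted) / n_total
    unsigned = sum(abs(w) for w in weighted) / n_total
    return signed, unsigned


# Hypothetical data: one quintile does better than predicted and one
# does worse, so the signed index cancels while the unsigned does not.
toy = [([1, 1], [0.5, 0.5]), ([0, 0], [0.5, 0.5])]
print(z_item(toy))  # (0.0, 1.0)
```

The toy example shows why the signed index is small when the direction of bias is inconsistent across quintiles.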

Chi-square. Scheuneman (1975) said that "an item is
unbiased if, for all individuals having the same score on a
homogeneous subtest containing the item, the proportion of
individuals getting the item correct is the same for each
population group being considered". Baker (1981) pointed
out several problems with Scheuneman's procedure, which
focused only on correct responses to an item. Camilli
(1979) modified the procedure to consider both correct and
incorrect responses in an analysis.

The chi-square procedure divides the total test score
scale into discrete ability intervals or score levels. The
present study used five test score intervals because Rudner
et al. (1980) found that the chi-square technique using five
intervals was as effective as the three-parameter item
characteristic curve method under most of the investigated
conditions. Observed frequencies are counted within each
interval for each group with regard to the correct and
incorrect answers, respectively. Expected frequencies are
computed by multiplying the proportion of examinees who
respond correctly or incorrectly to the item within a total
score interval by the total number of examinees within
each score interval for each group.
The observed frequencies are compared with the expected
frequencies using a type of chi-square statistic that is the
sum of terms $(O - E)^2 / E$ across all intervals and groups.
The full chi-square statistic for all responses is the sum
of the chi-square values for correct and incorrect responses.
The full chi-square statistic is the index of bias. A
large value indicates greater bias. A signed measure can be
computed by considering the direction of bias within each
interval and by attaching a positive sign to the squared
terms for the interval if the black group is favored and a
negative sign if the white group is favored.
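
The computation described above can be sketched for a single item as follows. This is an illustrative reading of the procedure, not Camilli's own code: the function name and input layout are hypothetical, empty or uninformative intervals are simply skipped (an assumption), and the minority group is treated as "favored" in an interval when its observed number of correct answers exceeds the expected number.

```python
def camilli_chi_square(table):
    """Full chi-square index for one item.

    table: one entry per score interval, each of the form
    ((right_majority, wrong_majority), (right_minority, wrong_minority)).
    Returns (unsigned, signed) sums over intervals.
    """
    unsigned = 0.0
    signed = 0.0
    for (r1, w1), (r2, w2) in table:
        n1, n2 = r1 + w1, r2 + w2
        if n1 == 0 or n2 == 0:
            continue  # a group is missing from this interval
        p_right = (r1 + r2) / (n1 + n2)  # pooled proportion correct
        if p_right == 0.0 or p_right == 1.0:
            continue  # expected frequencies of zero carry no information
        term = 0.0
        for n_g, r_g, w_g in ((n1, r1, w1), (n2, r2, w2)):
            e_right = p_right * n_g
            e_wrong = (1.0 - p_right) * n_g
            term += (r_g - e_right) ** 2 / e_right
            term += (w_g - e_wrong) ** 2 / e_wrong
        unsigned += term
        minority_favored = r2 > p_right * n2
        signed += term if minority_favored else -term
    return unsigned, signed


# Hypothetical single interval in which the minority group does better.
print(camilli_chi_square([((30, 70), (70, 30))]))  # (32.0, 32.0)
```

In the study itself five such intervals would be supplied for each item; the toy call uses one interval only to keep the arithmetic easy to check by hand.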

Revised TID. Angoff and Ford (1973) originally
proposed a method based on the traditional item difficulty
index. If an item is relatively more difficult in one
subpopulation than another, it is considered biased.

The item difficulty or P value (the proportion of subjects
getting the item right) is computed separately for each
group and for each item. The P value is transformed to a Z
value, which is the (1 - P)th percentile of the standard normal
distribution. After the transformation, the plot of Z
values tends to be linear. A delta value is calculated from
the Z value to eliminate negative values using the linear
transformation $\Delta = 4Z + 13$. A large $\Delta$ value indicates a
difficult item. The pair of corresponding $\Delta$ values for each
item is graphed on a two-dimensional scatter plot for the
two groups. The plot of points appears in the form of an
ellipse, like the usual correlation diagram. The major axis
line of the ellipse is drawn on the scatter diagram. A
measure of bias is the length of the perpendicular line from
a given point in the plot to the major axis. The formulas
for determining the major axis and the perpendicular line
are given in Angoff and Ford (1973). A large distance from
a given point for the item to the major axis indicates a
more biased item.
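
The delta transformation and the perpendicular distance can be sketched as follows. The text cites Angoff and Ford (1973) for the formulas without reproducing them, so this sketch assumes the usual principal (major) axis of the bivariate scatter; the function names, the sign convention, and the assignment of groups to axes are hypothetical.

```python
import math
from statistics import NormalDist, mean


def delta(p):
    """Delta value for a proportion correct p (0 < p < 1)."""
    z = NormalDist().inv_cdf(1.0 - p)  # (1 - P)th percentile of N(0, 1)
    return 4.0 * z + 13.0


def major_axis_distances(x, y):
    """Signed perpendicular distances of points (x, y) from the major axis.

    x and y hold the delta values of the same items in two groups.
    Assumes the two sets of deltas have non-zero covariance.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    vx = sum((u - mx) ** 2 for u in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    slope = ((vy - vx) + math.sqrt((vy - vx) ** 2 + 4.0 * cov ** 2)) / (2.0 * cov)
    intercept = my - slope * mx
    norm = math.sqrt(slope ** 2 + 1.0)
    # Positive when the point lies below the line (smaller y than predicted).
    return [(slope * u + intercept - v) / norm for u, v in zip(x, y)]


print(delta(0.5))  # an item answered correctly by half the group: 13.0
```

Because the major axis passes through the centroid of the scatter, the signed distances sum to zero; an unsigned index is the absolute value of each distance.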

Hunter (1975), Lord (1977), and Angoff (1982) have 
pointed out theoretical limitations of the transformed item 
difficulty method (TID). If there is a large difference in 
average abilities between two groups, the TID method may 
indicate bias where none exists. Thus Angoff (1982) 
recently proposed dividing Z by the item-total correlation, 
the classical item discrimination, to obtain a Z' index. 

P values and Z values are computed exactly like those 
based on the original transformed item difficulty method. 
Next the item discrimination, the point biserial correlation 
between an item and total score in each group, is computed. 
The newly derived Z' value is then calculated by dividing 
the Z value by the item-test correlation. Z' is essentially
equivalent to the difficulty index of latent trait
theory if the normal ogive is fitted to the item response
data (Baker, 1965).

Once Z' values are known, the procedure for plotting
delta values, drawing the major axis, and measuring the
perpendicular distance from a point to the major axis is
the same as that of Angoff and Ford's original transformed item
difficulty method. A large distance indicates a more biased
item. Both signed and unsigned indices were computed in the
present study. A positive sign was attached to the bias index
if the black group was favored and a negative sign if the
white group was favored.
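
The revised index can be sketched for one item in one group as follows. The function names and the toy data are hypothetical; the sketch assumes that the item-test correlation is the ordinary Pearson correlation between the 0/1 item score and the total score, and that the proportion correct is strictly between 0 and 1.

```python
import math
from statistics import NormalDist, mean


def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def z_prime(item_scores, total_scores):
    """Z divided by the item-total correlation for one item in one group."""
    p = mean(item_scores)
    z = NormalDist().inv_cdf(1.0 - p)
    return z / pearson(item_scores, total_scores)


# Hypothetical group of four examinees.
print(round(z_prime([1, 1, 1, 0], [4, 3, 2, 1]), 3))  # -0.871
```

Since the correlation sits in the denominator, an item-test correlation near zero inflates the magnitude of Z', which is one way extreme values can arise with this index.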

Data Source. 

Data for this investigation are from a study by
Subkoviak, Mack, Ironson, and Craig (1984). The purpose of
using this data set, in which bias has been deliberately
manipulated by including black vocabulary items, is to
investigate how Linn and Harnisch's pseudo-IRT(Z), Angoff's
revised transformed item difficulty method, and Camilli's
technique perform when the biased items are known
externally. Specifically, these data consist of responses
to 50 multiple-choice vocabulary test items, including 10
black slang items which were intentionally written by a
black author to be biased against whites, independent of any
statistical index of item bias. The other 40 items were
drawn from the verbal section of the College Qualification
Test, which is an aptitude test constructed for college
students. Four-option multiple-choice items were used, and
subjects were asked to choose the option which is a synonym
for a given word. Black slang items were inserted randomly
in each block of five items on the test. Directions for the
test informed students that some of the words are
standardized English, while others are slang. Further
details of the data are provided by Subkoviak et al. (1984).

There were 1,022 whites and 1,008 blacks. In this
study, data for all 1,022 whites but only 300 blacks were
analyzed, since the methods of interest here are especially
recommended when the minority group is small. The 300
blacks were selected randomly from the entire sample of
1,008 blacks.

Analysis 

For this study, the Pearson correlation coefficient was
used to investigate the accuracy of each bias detection
method. The point-biserial correlation between the a priori
bias index and the detected bias measure for each method
indicates how well each method detects the items
intentionally written to be biased. The ten slang items
were coded (1) and the forty standard vocabulary items were
coded (0) as an index of the a priori bias intentionally
included in the test. In addition to this correlation,
percentage agreement statistics were computed to determine
the proportion of items which are classified as biased by a
particular method. The agreement statistic (%) is the
proportion of correct classifications; that is, the number of
items detected as biased which are black slang items plus the
number of items not detected as biased which are not black slang items,
divided by the total number of items.

The degree of correlation among bias detection methods 
was also computed. The correlation between the bias indices 
for two methods indicates how closely one method is 
associated with another method. 
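
The correlation with the a priori index can be sketched as follows. The six-item data set is hypothetical and far smaller than the 50-item test; it only illustrates that applying the Pearson formula to a zero-one coding yields the point-biserial correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: the last two items were written to be biased.
a_priori = [0, 0, 0, 0, 1, 1]
signed_index = [-0.1, 0.0, 0.1, -0.2, 0.7, 0.8]
r_pb = pearson(a_priori, signed_index)
print(round(r_pb, 3))  # 0.969
```

The same function applied to the bias indices of two methods gives the intercorrelation between those methods.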



RESULTS 

Detection of A Priori Bias

Correlation. The resulting item bias indices for all
methods are reported in Appendix A. Pearson correlations
between the a priori bias index (zero-one coding) and the
bias index for each procedure are shown in Table 1.

Table 1. Correlations between A Priori Bias and
Detected Bias

Method            Unsigned Measure    Signed Measure

Pseudo-IRT(Z)          .710                .762
Camilli's χ²           .691                .798
Revised TID            .345                .522

Separate correlations with a priori bias were
calculated for both signed and unsigned indices of bias.
The unsigned pseudo-IRT(Z) and Camilli's χ² measures
correlated .715 and .691, respectively, with the known bias,
whereas Angoff's revised TID correlated .345. Similarly,
the signed pseudo-IRT(Z) and the chi-square measures
correlated .762 and .798 with a priori bias, whereas the
revised TID correlated .522. For the signed measures, which
are more consistent with the a priori index, Camilli's
chi-square procedure has the highest correlation, followed
closely by Linn-Harnisch's pseudo-IRT(Z) method, with
Angoff's revised TID procedure last.

Shepard et al.'s (1985) study produced results similar
to those of this study. However, in their study, the signed
pseudo-IRT(Z) produced the highest correlation with external
bias, followed by Camilli's signed chi-square, and then the
delta plot procedure. In their study, the signed
pseudo-IRT(Z) measure and the signed chi-square index were
correlated .62 and .59, respectively, with an a priori index
based on ICC-3 analysis of their data, which may have
favored the IRT(Z) procedure.

This study confirmed that the correlations between a
priori bias and the signed measures for all methods were
higher than the corresponding correlations based on unsigned
measures (Subkoviak et al., 1984). This is reasonable because
the standard deviation of signed measures is larger than
that of unsigned measures, as indicated in Table 2, and
because signed measures are directional, like the a priori
index used to compute the Pearson correlation coefficient
for each method.

Table 2. Descriptive Statistics of Bias Index for Each
Method

              Unsigned Measure                    Signed Measure
          IRT(Z)  Camilli χ²  Rev. TID      IRT(Z)  Camilli χ²  Rev. TID

Mean       .232     60.9       25.6          .055     13.0        .8
Stdev.     .178     87.1       39.5          .266    106.0      47.2

Percentage Agreement. Additional information used to
investigate the accuracy of each bias detection method is the
percentage agreement or concordance between a priori bias
and the bias detected by each method. Items detected as
biased by each method are compared to the known bias.
Contingency Table 3 shows the number of items which are
detected as biased by each method.

As Table 3 indicates for the unsigned measure, the
pseudo-IRT(Z) procedure and Camilli's chi-square had 92%
agreement with a priori bias, whereas the revised TID had
76%. It may be noted that the unsigned measure of the
revised TID method detected only four of the ten slang
items as biased. It also falsely identified six
unbiased items as biased.

Table 3. Contingency Table and Percentage Agreement of
Each Method for Detecting the Imbedded Bias in the Test

Unsigned Measure

                          A Priori
Method      Detected     B      NB     Total    Agreement    Phi

IRT(Z)      B            8       2      10
            NB           2      38      40
            Total       10      40                 92%       .75

χ²          B            8       2      10
            NB           2      38      40
            Total       10      40                 92%       .75

Rev. TID    B            4       6      10
            NB           6      34      40
            Total       10      40                 76%       .25

Signed Measure

                          A Priori
Method      Detected     B      NB     Total    Agreement    Phi

IRT(Z)      B            9       0       9
            NB           1      40      41
            Total       10      40                 98%       .937

χ²          B           10       0      10
            NB           0      40      40
            Total       10      40                100%      1.0

Rev. TID    B            8       1       9
            NB           2      39      41
            Total       10      40                 94%       .807

The signed measure of Camilli's chi-square method
detected all black slang items as biased for 100%
agreement, while Linn and Harnisch's pseudo-IRT(Z) method
and Angoff's revised transformed item difficulty method
achieved 98% and 94%, respectively. Signed measures appeared
more accurate in detecting bias than unsigned measures. For
the present data, Camilli's chi-square appeared to be the
best method, followed by Linn-Harnisch's pseudo-IRT(Z), and
finally Angoff's revised TID method.
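
The agreement percentage and phi coefficient for a two-by-two contingency table can be checked with a few lines of arithmetic. The function name below is hypothetical; the cell counts in the example calls are those read from Table 3 for the unsigned pseudo-IRT(Z) and unsigned revised TID measures.

```python
import math


def agreement_and_phi(a, b, c, d):
    """Agreement proportion and phi for a 2 x 2 table.

    a: detected as biased and a priori biased
    b: detected as biased but a priori unbiased
    c: not detected but a priori biased
    d: not detected and a priori unbiased
    """
    n = a + b + c + d
    agreement = (a + d) / n
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return agreement, phi


print(agreement_and_phi(8, 2, 2, 38))  # (0.92, 0.75)
print(agreement_and_phi(4, 6, 6, 34))  # (0.76, 0.25)
```

Agreement counts only the two diagonal cells, whereas phi also reflects the marginal totals, which is why a method can show fairly high agreement and still have a low phi when most items are unbiased a priori.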

Agreement Among Methods. Pearson correlations are one
measure of how closely one bias technique is related to
another procedure. Intercorrelations among the bias indices
of the three methods are reported in Table 4. The correlations
among item bias detection procedures were separately
computed for signed and unsigned bias measures.

Table 4. Intercorrelations Among Bias Measures

                   Unsigned Measure             Signed Measure
                IRT(Z)   Camilli's χ²        IRT(Z)   Camilli's χ²

Camilli's χ²     .901                         .893
Revised TID      .399       .354              .451       .497



Camilli's chi-square procedure is correlated highly
with the pseudo-IRT(Z) method for both signed and unsigned
measures (r = .901, .893) because both procedures use the same
type of definition of bias and quintile groupings. For both
signed and unsigned measures, Angoff's revised TID procedure
is associated weakly with the pseudo-IRT(Z) method and
Camilli's chi-square procedure. These results confirm that
the revised TID technique is not consistent with other bias
methods (Shepard et al., 1985).

CONCLUSION AND DISCUSSION



Signed measures detected a priori bias more precisely
than unsigned measures for each method. Furthermore, Linn
and Harnisch's pseudo-IRT(Z) method and Camilli's chi-square
procedure were better at detecting a priori bias than
Angoff's revised TID method. For the intercorrelation among
bias methods, Camilli's chi-square technique was highly
correlated with Linn and Harnisch's pseudo-IRT(Z) method (r
≥ .893). However, Angoff's revised transformed item
difficulty procedure is only weakly associated with the
other two methods (r ≤ .497).

This study supports Shepard et al.'s study showing that
there is high agreement between the pseudo-IRT(Z) and the
simpler chi-square method and that Angoff's revised
transformed item difficulty is not in close agreement with
the other two. In other words, Angoff's revised TID
procedure does not generally appear to be a good method to
detect item bias. In Angoff's revised transformed item
difficulty method, low item-test correlations resulted in
extreme values of Z' and misleading bias indices in the
present study (see Appendix B).

This study shows somewhat different results from
Shepard et al.'s study (1985). The current study suggests
that Linn and Harnisch's pseudo-IRT(Z) method may be
slightly less accurate than Camilli's chi-square procedure.
There are two possible reasons for this.

One reason is that it may not be appropriate to fit the
three-parameter item response model to combined minority and
majority data. The black slang items have low item
discriminations because there are many whites and a small
number of blacks in the data set. The estimates of item
parameters may be influenced by combining the target group
with the majority group (see Appendix C).

Another reason may be violation of the test unidimensionality
assumed by the pseudo-IRT(Z) method. There are
thirteen principal components in the test having eigenvalues
greater than one (see Appendix D), but the scree plot
of these eigenvalues suggests two (or possibly three)
factors in the test. Only two of the unrotated factors have
many items whose factor loadings exceed .35 in absolute
value. The first unrotated factor is related to thirty-three
standard vocabulary items; the second factor is
related to five black slang items. Only one or two items
account for the remaining factors. After varimax rotation
of the two-factor solution, the primary factor might be
called standard vocabulary and the second factor black slang,
based on loadings exceeding .35.

Even though Linn and Harnisch's pseudo-IRT(Z) appears
to be on theoretically sound ground because it retains the
benefits of item response theory, it is questionable whether
Linn and Harnisch's pseudo-IRT(Z) method is necessarily better
than Camilli's chi-square method when the minority group is
small.

Appendix A  Item Bias Indices for All Items and All Methods

Item    US(Z)     S(Z)      US(χ²)     S(χ²)      US(D)      S(D)

   1    0.097     0.001     48.213    -48.042     10.900    -10.900
   2    .13.1    -0.300     61.789    -61.714     12.226    -12.226
*  3    0.490     0.490    119.140    119.140      3.547      3.547
   4    0.250    -0.13.3    24.475    -24.297     11.431    -11.431
   5    0.120     0.128     29.281    -29.281     16.242    -16.242
   6    0.195     0.168     33.869    -33.166      7.091     -7.091
*  7    0.119    -0.112     19.417     17.484      0.017     -0.017
   8    0.240     0.171     18.091    -18.052     21.935    -21.935
   9    0.099     0.099     18.386     13.677     38.119    -38.119
  10    0.153     0.009     19.964     -8.440     10.361    -10.361
  11    0.115    -0.020     18.744    -18.731     11.516    -11.516
  12    0.139    -0.009     15.521    -15.521     10.929    -10.929
  13    0.081    -0.023     44.753    -44.694     11.439    -11.439
  14    0.092    -0.042     45.069    -44.994     11.328    -11.328
* 15    0.230     0.230     36.161     36.161      5.071     -5.071
  16    0.144     0.068     21.424     21.396     10.882     10.882
  17    0.085    -0.080      2.579     -1.231     32.243    -32.243
  18    0.090    -0.039     24.899    -23.623     22.496    -22.496
  19    0.233    -0.164     29.399    -27.417     14.659    -14.659
* 20    0.751     0.751    258.060    258.060    100.220    100.220
  21    0.251    -0.108     46.225    -46.046     14.040    -14.040
  22    0.103    -0.085     20.400    -18.601     20.964    -20.964
  23    0.285     0.125     26.351    -26.351     15.630    -15.630
* 24    0.444     0.444    207.350    207.350     14.708     14.708
  25    0.163    -0.087     73.399    -73.361     16.432    -16.432
  26    0.140    -0.026      4.225      2.301     13.565    -13.565
* 27    0.350     0.322    123.900    123.630      3.056      3.056
  28    0.170    -0.170     46.837     46.684     13.186    -13.186
  29    0.354    -0.354     47.198    -47.198     25.968    -25.968
  30    0.107     0.008      7.966     -3.358     11.998    -11.998
  31    0.249    -0.13.1    23.841    -23.841     12.339    -12.339
  32    0.115    -0.115     45.085     41.336     12.493    -12.493
  33    0.127     0.000     28.652    -28.652     11.342    -11.342
* 34    0.299     0.299     76.942     76.942    230.107    230.107
  35    0.194    -0.016     21.044    -20.890     11.982    -11.982
  36    0.220    -0.220     37.822    -37.822      7.860     -7.860
* 37    0.829     0.829    375.290    375.290    120.699    120.699
  38    0.142     0.086     16.689    -14.632      9.256     -9.256
  39    0.104     0.008      1.856      1.827     10.078    -10.078
  40    0.141     0.003     23.551    -10.900     72.563    -72.563
  41    0.268    -0.268     14.731    -13.934     15.176    -15.176
  42    0.193     0.393     18.339    -17.869     20.176    -20.176
* 43    0.463     0.463    154.900    154.764      3.717      3.717
  44    0.344    -0.043      5.774      0.334     10.864    -10.864
  45    0.268    -0.183     22.069    -22.227     11.750    -11.750
  46    0.087    -0.023     31.151    -30.173     10.781    -10.781
  47    0.306    -0.306     54.816    -54.816    118.842    118.842
  48    0.180    -0.180     29.447    -29.372     16.261    -16.261
  49    0.202    -0.202    141.110   -141.084     15.484    -15.484
* 50    0.051     0.851    429.400    429.400     44.176     44.176

* Black Slang Item


Appendix B  Item Information for Computing the Item Bias Index
of Angoff's Revised Transformed Item Difficulty Method

Appendix C  Estimates of Item Parameters Based on the
Three-Parameter Logistic Model for Linn-Harnisch's
Pseudo-IRT(Z) Method

Appendix D  Principal Component Analysis of the Test Items

Unrotated    Eigen     No. of Standard    No. of Slang    No. of Total
Factor       Value     Items              Items           Items

    1        8.044          33                 1               34
    2        2.320           2                 5                7
    3        1.961           2                 1                3
    4        1.217           1                 0                1
    5        1.192           0                 1                1
    6        1.169           0                 0                0
    7        1.124           0                 1                1
    8        1.109           1                 1                2
    9        1.088           1                 0                1
   10        1.075           0                 0                0
   11        1.041           2                 0                2
   12        1.011           1                 1                2
   13        1.001           0                 0                0

[Scree plot: eigenvalue (vertical axis) plotted against factors 1 through 13 (horizontal axis).]

Unrotated Loadings for First Three Factors 